A nomogram to predict severe COVID-19 patients with increased pulmonary lesions in early days

Objectives: This study aimed to predict progression to severe coronavirus disease 2019 (COVID-19) in patients whose pneumonia lesions increased in the early days of illness. A simplified nomogram was developed using artificial intelligence (AI)-based quantitative computed tomography (CT).
Methods: From 17 December 2019 to 20 February 2020, a total of 246 patients were confirmed to have COVID-19 at Jingzhou Central Hospital, Hubei Province, China. Of these patients, 93 were mildly ill and had follow-up examinations within 7 days, and 61 of them had enlarged lesions on CT scans. We collected the neutrophil-to-lymphocyte ratio (NLR) and three quantitative CT features from two examinations within 7 days. The three quantitative CT features of pneumonia lesions, namely ground-glass opacity volume (GV), semi-consolidation volume (SV), and consolidation volume (CV), were automatically calculated using AI. The changes in lesion volumes between the two examinations were also computed. Finally, a nomogram was developed using a multivariable logistic regression model. To simplify the model, we categorized all lesion volumes based on quartiles and curve fitting results.
Results: Among the 93 patients, 61 showed enlarged lesions on CT within 7 days, of whom 19 (31.1%) developed severe illness. The multivariable logistic regression model included age, NLR at the second examination, the increase in lesion volume, and the changes in SV and CV over 7 days. The personalized prediction nomogram demonstrated strong discrimination in the sample, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.961 (95% confidence interval [CI], 0.917–1.000). Decision curve analysis showed that the nomogram based on AI-quantified CT was clinically useful.
Conclusion: The integration of quantitative CT changes, NLR, and age in this model shows promising performance in predicting progression to severe illness in COVID-19 patients with early-stage pneumonia lesions. This comprehensive approach has the potential to assist clinical decision-making.


Introduction
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), rapidly became a global pandemic. The virus spread extensively in China and various other regions (1). While current vaccines mitigate the impact of COVID-19, humanity should remain prepared for potential future pandemic threats that could endanger the global population (2). As with other rapidly progressing pneumonias, pneumonia lesions in most COVID-19 patients were observed to progress rapidly within 1 week (3). Critically ill patients had a high fatality ratio (4). Thus, it is very important to identify high-risk patients in the early stages so that prevention and treatment efforts can be tailored accordingly.
Pneumonia lesions were observed earlier than clinical symptoms in most COVID-19 patients (4). This indicated that computed tomography (CT) can be used in the early diagnosis and monitoring of COVID-19 (5). Compared with CT feature scores and radiomics, artificial intelligence (AI)-derived quantification of lung CT scans offers the advantages of objectivity, ease of accessibility, and time efficiency (6)(7)(8). Many clinical features, including clinical factors and laboratory examinations, have been identified as prognostic risk factors (9). Some published studies indicated that CT features, alone or combined with clinical features, outperformed traditional clinical features alone in prognosis prediction (10)(11)(12). However, most of these studies focused on CT imaging obtained at the time of hospital admission and neglected pre-admission data. A few studies have noted the potential prognostic value of changes occurring in the short term. However, these studies did not exclude negative changes, such as decreases in the volume of pneumonia lesions, from their analyses (13,14).
Therefore, our study proposes a simple nomogram that has the potential to predict severe COVID-19 in patients who have increased pulmonary lesions in the early days of infection. The nomogram is designed to use quantitative lung CT data to evaluate the severity of lung damage in COVID-19 patients.

Patients
A total of 246 patients were diagnosed with COVID-19 by SARS-CoV-2 nucleic acid test in Jingzhou Central Hospital between 17 December 2019 and 20 February 2020. Patients who had been treated in another hospital for more than 7 days (n = 124), who were diagnosed as severely ill at the time of the first chest CT scan (n = 24), or who missed the follow-up chest CT scan within the first 7 days (n = 5) were excluded from this study (Figure 1). Among the remaining 93 patients, 61 exhibited enlarged pneumonia lesions, with an enlargement exceeding 10 cm³. All 61 patients were then followed up for 30 days. Of these, 42 showed no progression to severe illness and 19 progressed to severe illness.
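The cohort counts above can be checked with simple arithmetic. The following is a minimal sketch using only the numbers reported in the text; the variable names and dictionary labels are illustrative and not taken from the study.

```python
# Cohort counts as reported in the text and Figure 1.
confirmed = 246
excluded = {
    "treated elsewhere for more than 7 days": 124,
    "severely ill at first chest CT": 24,
    "no follow-up CT within 7 days": 5,
}

# Patients remaining after the three exclusion criteria.
eligible = confirmed - sum(excluded.values())

enlarged = 61                       # lesion enlargement exceeding 10 cm^3
not_enlarged = eligible - enlarged  # reduced lesions or increase below 10 cm^3

severe = 19
non_severe = enlarged - severe
severe_rate = severe / enlarged

print(eligible, not_enlarged, non_severe, f"{severe_rate:.1%}")
# prints: 93 32 42 31.1%
```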
According to the Chinese national diagnosis and treatment protocols for COVID-19 (15) and the guidelines of the American Thoracic Society (16), patients were defined as having severe illness if they reached any of the following endpoints: (a) multilobar infiltrates and respiratory failure (arterial oxygen pressure (PaO2) < 60 mmHg, with or without PaCO2 > 50 mmHg); (b) mechanical ventilation, intensive medical care, or extracorporeal membrane oxygenation (ECMO) treatment; (c) organ damage; and (d) in-hospital mortality. In this study, organ damage included acute respiratory distress syndrome (ARDS, according to the Berlin definition), acute cardiac injury (cardiac function grade ≥ IV according to the New York Heart Association functional classification), acute kidney injury (AKI, according to the Kidney Disease Improving Global Outcomes clinical practice guidelines), and liver dysfunction (alanine aminotransferase > 5 times the upper limit of normal) (7,13).

Clinical data and CT examination collection
We collected demographic, clinical, laboratory, and outcome data from electronic medical records in the hospital information system (HIS). The baseline characteristics collected included sex, age, and comorbidities. In this study, the comorbidities included diabetes, hypertension, cardiovascular disease, cerebrovascular disease, chronic obstructive pulmonary disease (COPD), chronic hepatitis B infection, and malignant tumors. The laboratory examinations at the initial visit and at follow-up within 7 days included the neutrophil-to-lymphocyte ratio (NLR) and C-reactive protein (CRP). The changes between the two examinations were also calculated.
Chest CT images from the initial examination and from the follow-up within 7 days were acquired from the hospital Picture Archiving and Communication System (PACS). When examinations came from both outpatient and inpatient departments, they were matched by name, age, exam time, exam number, and ID card number (if applicable) to verify that they belonged to the same individual.
All CT examinations were performed using a 16-slice scanner (Siemens SOMATOM Emotion, Germany) without contrast injection while the patients held their breath. Patients were scanned in the supine, head-first position. Imaging parameters were set as follows: 130 kV; automatic tube current; slice width, 5 mm; beam collimation, 1.2 mm; tube rotation time, 0.6 s; reconstructed slice thickness, 1.5 mm; and matrix, 512 × 512.

CT image review and AI-based quantization
Computed tomography images were automatically analyzed using an AI system (YT-CT-Lung, YITU Healthcare Technology Co., Ltd., China). This AI system used a fully convolutional network along with adaptive thresholding and morphological operations to segment the lungs and pneumonia lesions (17,18). Two radiologists, each with over 10 years of experience, independently reviewed the CT images to confirm that the AI-based segmentations matched the extent of the lesions on the CT scans. Illustrative examples of the automatically segmented pulmonary tissue and COVID-19 pneumonia lesions are shown in Figure 2.
Besides the total volume of pneumonia lesions, three other quantitative features were computed by thresholding on CT values. Lesion volumes with CT values of −750 HU to −500 HU, −500 HU to −200 HU, and −200 HU to 60 HU were defined as ground-glass opacity volume (GV), semi-consolidation volume (SV), and consolidation volume (CV), respectively. In this study, semi-consolidation refers to an intermediate, homogeneous area with a noticeable increase in density (19).
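The thresholding step described above can be illustrated with a short sketch. This is not the YT-CT-Lung implementation, which the source does not describe in detail. The function names, the half-open handling of boundary values, the voxel size, and the HU values in the toy example are all assumptions made for illustration.

```python
# HU ranges from the text. Treating each range as half-open [low, high) is an
# assumption: the source does not say how boundary values are assigned.
HU_BANDS = {
    "GV": (-750, -500),  # ground-glass opacity volume
    "SV": (-500, -200),  # semi-consolidation volume
    "CV": (-200, 60),    # consolidation volume
}


def lesion_volumes(lesion_hu, voxel_volume_cm3):
    """Sum the volume (cm^3) of lesion voxels that fall in each HU band.

    lesion_hu: iterable of HU values for voxels already segmented as lesion.
    voxel_volume_cm3: volume of a single voxel in cm^3.
    """
    volumes = {name: 0.0 for name in HU_BANDS}
    for hu in lesion_hu:
        for name, (low, high) in HU_BANDS.items():
            if low <= hu < high:
                volumes[name] += voxel_volume_cm3
                break
    return volumes


def volume_change(first, second):
    """Per-band change in volume between two examinations (second - first)."""
    return {name: second[name] - first[name] for name in HU_BANDS}


# Toy example with made-up HU values and a made-up voxel size of 0.001 cm^3.
first = lesion_volumes([-700, -650, -450, -100], 0.001)
second = lesion_volumes([-700, -450, -400, -300, -100, 20], 0.001)
delta = volume_change(first, second)
```

In this toy example the semi-consolidation and consolidation bands grow between the two scans while the ground-glass band shrinks, which is the kind of per-band change the study feeds into its regression model.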

Statistical analysis
Patients were divided into two groups according to the results of follow-up: the severe group and the non-severe group. One NLR result was not collected, and we replaced it with the median value. Five CRP results (8.2%) were missing at the initial examination, and three (4.9%) were missing at follow-up. To address these missing data, we performed multiple imputation by chained equations with five imputations using the R MI procedure.
We presented continuous variables as the median and interquartile range (IQR) and categorical variables as numbers and percentages. The Wilcoxon rank-sum test and chi-square test were applied for comparisons between the severe and non-severe groups.
Binary logistic regression was used to explore the association between the predictive features and the development of severe illness. We performed a selection process by excluding all variables with p-values of ≥0.05 in the univariate regression analysis. Subsequently, we analyzed the remaining variables using multivariable regression. To simplify the model, we categorized all the continuous variables based on quartiles and curve fitting results. Finally, we constructed a mixed nomogram model based on the regression coefficients. The calibration of this model was evaluated through bootstrapping with 200 resamples to produce calibration curves and with the Hosmer-Lemeshow goodness-of-fit test. The discrimination of the nomogram model was assessed by the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The AUCs of the prediction models were compared using the DeLong non-parametric approach. The clinical benefits of the model were determined using decision curve analysis (DCA).
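To make the discrimination measure concrete, the sketch below computes an AUC directly from its definition: the probability that a randomly chosen severe case receives a higher predicted risk than a randomly chosen non-severe case. The study itself used R; this Python function, its name, and the toy data are illustrative assumptions and do not reproduce the authors' analysis.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks and outcomes (1 = severe, 0 = non-severe).
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)  # 8 of 9 pairs are concordant
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; rank-based formulas give the same value more efficiently on larger data.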
Two-tailed tests were conducted, considering p values of <0.05 as statistically significant. Statistical analyses were carried out using R version 3.3.2 (The R Foundation, https://www.r-project.org/) and the Free Statistics software package version 1.7.

Clinical and CT quantitative characteristics
Of the total 93 COVID-19 patients, 32 presented reduced pneumonia lesions or a volume increase of less than 10 cm³, and 61 showed increased lesions on follow-up lung CT. There were no statistical differences in age, gender, or comorbidity between the two groups. However, the rate of progression to severe illness differed significantly, with a p value of 0.006 (Supplementary Table 1).
We classified the 61 patients whose lung lesions enlarged by more than 10 cm³ into two groups based on disease severity at follow-up: non-severe (n = 42) and severe (n = 19). Severe patients (54.0 years, IQR 42.0-60.0) were older than non-severe patients (38.5 years, IQR 32.2-49.8, p = 0.014). Although there was a higher proportion of males in the severe group than in the non-severe group, this difference did not reach statistical significance (55.7 vs. 44.3%, p = 0.18). The presence of comorbidities was significantly associated with a worse outcome (p = 0.04). There was no significant difference between the two groups in the interval between the initial and follow-up examinations.
At the initial examination, there were no significant differences in any clinical or CT features between severe and non-severe patients. However, at follow-up, NLR (p = 0.032) and CRP (p = 0.027) were significantly higher in the severe patients than in the non-severe patients. Among the CT features, all except GV (lesion volume, CV, and SV) were significantly higher in the severe group than in the non-severe group (p < 0.05). Regarding the changes over 7 days, the increase in lesion volume ranged from 10.71 to 958.69 cm³. All CT features except GV exhibited a more pronounced increase in the severe group, whereas the changes in the clinical biomarkers (NLR and CRP) did not differ significantly (Table 1).

Relationships between clinical and CT quantitative characteristics and severe illness
The results of the univariate and multivariable logistic regression are presented in Table 2. At follow-up, NLR, SV, and CV were significantly associated with progression to severe illness in COVID-19 patients with increased pneumonia lesions in the early days (p < 0.05). However, when changes over 7 days were assessed, only CT characteristics (lesion volume, SV, and CV) exhibited a significant association with severe illness. Notably, there were minimal disparities between the results of the univariate and multivariable logistic regression analyses.

Development and performance of prediction model
In our study, NLR at the initial examination and its change over 7 days were not significantly associated with progression to severe illness in COVID-19 patients whose pneumonia lesions increased during the early days. However, NLR at the follow-up examination showed a significant association. Based on clinical experience, we added it to the mixed model. Finally, age, NLR in the second blood test, and changes in quantitative CT features (lesion volume, SV, and CV) between the two examinations were selected based on the multivariable logistic regression analysis. Combining quartiles and curve fitting results, a mixed model was developed and presented as a nomogram (Figure 3). The model showed strong discrimination between severe and non-severe COVID-19 patients among those who had increased lung lesions in the early days, with an AUC of 96.05% (95% CI, 91.66-100%). The sensitivity, specificity, and accuracy of the model were 94.74%, 90.48%, and 91.8%, respectively.
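The reported sensitivity, specificity, and accuracy are mutually consistent with the group sizes of 19 severe and 42 non-severe patients. The sketch below shows this under the assumption that 18 severe and 38 non-severe patients were classified correctly. These two counts are inferred from the reported percentages and are not stated in the source.

```python
# Group sizes from the text. The true-positive and true-negative counts are
# inferred from the reported percentages and are NOT stated in the source.
severe, non_severe = 19, 42
true_positive, true_negative = 18, 38  # assumed counts

sensitivity = true_positive / severe
specificity = true_negative / non_severe
accuracy = (true_positive + true_negative) / (severe + non_severe)

print(f"{sensitivity:.2%} {specificity:.2%} {accuracy:.3f}")
# prints: 94.74% 90.48% 0.918
```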
The calibration curve of the nomogram showed strong agreement between the risk of severe illness predicted by the mixed model and the observed outcomes (Figure 3C). The Hosmer-Lemeshow test resulted in a p value of 0.666, indicating a good fit for the nomogram. The bootstrap validation technique yielded a p value of 0.854, confirming the model's acceptable performance.
Figure 3D illustrates the DCA, demonstrating that using the nomogram to predict severe illness provided a higher net benefit than both the "treat all" and "treat none" strategies across threshold probabilities ranging from 1% to 81% in the cohort. This highlights the clinical utility of the nomogram.
There was a significant difference in predictive efficacy between the mixed model with categorical variables and the model with continuous variables (p = 0.009) (Supplementary Figure 1).

Discussion
In our study, we developed a quantitative CT feature-based model that incorporates age and NLR from follow-up examinations within a 7-day period. This model could predict the risk of progressing to severe illness in COVID-19 patients who had an increase in pneumonia lesions in the early days. We evaluated potential features for predicting severe illness. This assessment uses three data points: the examination data from both outpatient and inpatient departments when the patients arrive at the hospital for the first time, a second data point obtained within a maximum interval of 7 days from the first examinations, and the changes observed between the two points. Our study has extended the observation period of the disease in comparison to previous studies that only used admission data (13).
Given their previously reported prognostic potential (20-22) and the feasibility of routine blood analysis, we selected NLR and CRP as representative laboratory biomarkers for comparison in this study. We observed improved performance of NLR and quantitative CT features in the second examination when the interval between the two examinations was less than 7 days. This observation indicated that the dynamic trends in NLR and CT manifestations are highly valuable in predicting adverse outcomes of COVID-19 in the early stages of the disease, and this may also apply to other viral pneumonias. The prevalence of severe COVID-19 in our cohort was approximately 31.15%, exceeding the rates reported in contemporaneous publications (10,14). Conversely, in patients without enlarged pneumonia lesions (n = 32), the rate of severe illness was 6.25%, which was lower than reported. The difference was statistically significant (p = 0.006) in our cohort. This indicated that patients with increased lung lesions in the early days are more likely to progress to severe illness than those with stable lung lesions. It is worth noting that there were no differences in the changes in CRP and NLR between the two groups (Supplementary Table 1), suggesting that CT is more sensitive than CRP and NLR in showing the development of COVID-19.
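The comparison of the two rates above can be checked with a 2 × 2 chi-square test. The sketch below assumes that 6.25% of 32 patients corresponds to 2 severe cases (an inferred count) and that an uncorrected Pearson chi-square test was used; the source names the chi-square test but does not say whether a continuity correction was applied. Under these assumptions the p value comes out close to the reported 0.006.

```python
import math

# 2 x 2 table: rows = lesion group, columns = (severe, non-severe).
# The count of 2 severe patients in the non-enlarged group is inferred
# from the reported 6.25% of 32 and is not stated directly in the source.
enlarged = (19, 42)
not_enlarged = (2, 30)

a, b = enlarged
c, d = not_enlarged
n = a + b + c + d

# Pearson chi-square statistic for a 2 x 2 table, without continuity correction.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```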
In our study, the nomogram demonstrated excellent performance with an AUC of 0.961 and exhibited good calibration. Notably, there was a significant difference in predictive efficacy between the mixed model incorporating categorical variables and the one incorporating continuous variables. This difference may arise from variations in either the development patterns of lesions or the stage of COVID-19 pneumonia in patients.
In the early stage, ground-glass opacity (GGO) is the most common CT finding and usually develops rapidly (23,24). Consolidation has also been observed, enlarging in some cases (1). Pathologically, GGO indicates interstitial thickening or alveolar damage, with airspaces partially filled with inflammatory exudate (25). Consolidation, on the other hand, may be attributed to the presence of cellular fibromyxoid exudates in the alveoli (26). Disruption of the basement membrane and activation of fibroblasts increase the risk of patients progressing to ARDS. Pathologically, moderate to severe COVID-19 pneumonia is characterized by extensive mixed-to-consolidation-dominant lesions. These lesions are indicative of diffuse alveolar damage (27). Logistic regression analysis revealed that the changes in SV and CV were independent prognostic factors, while GV did not show a significant association with prognosis. These findings align with the observed pathological changes.
In our study, the initial volume of lung lesions did not differ significantly between the two groups (p = 0.907), and the relationship between the volume of lung lesions on the initial CT scan and the severity of the illness was non-linear (p = 0.038) (Figure 4). This indicated that the development pattern, rather than the initial volume of lung lesions, had the greatest effect on disease progression.
To assess the models beyond mathematical performance metrics such as AUC, we used DCA to evaluate the predicted net benefit across various risk thresholds and examine the effect of different thresholds (28). DCA demonstrated that within the range of threshold probabilities from 0.01 to 0.81, using the nomogram to predict severe illness in COVID-19 pneumonia yielded greater net benefit than the "treat all" or "treat none" strategies.
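For readers unfamiliar with DCA, the sketch below shows the standard net benefit calculation that underlies a decision curve: true positives are credited and false positives are penalized by the odds of the threshold probability. The function names and toy data are illustrative assumptions; the source does not give the authors' code or patient-level predictions.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - fp / n * weight


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# The "treat none" strategy has a net benefit of 0 at every threshold.

# Toy example with made-up predicted risks and outcomes (1 = severe).
toy_risks = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05, 0.05]
toy_outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
pt = 0.5
nb_model = net_benefit(toy_risks, toy_outcomes, pt)    # 2/8 - 1/8 = 0.125
nb_all = net_benefit_treat_all(toy_outcomes, pt)       # 0.375 - 0.625 = -0.25
```

Under this definition, the "treat all" curve crosses zero where the threshold probability equals the prevalence, which is 19/61 (about 0.31) in this cohort.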
Our study had a few limitations. First, because of the retrospective design, only patients who underwent serial CT scans were evaluated. These scans were not based on a predefined protocol but were ordered according to clinical necessity, which introduces a potential selection bias. Second, this study was conducted at a single center with a small sample size. The ongoing mutation of the virus and the global pandemic have made it difficult to organize prospective multicenter trials specifically targeting patients infected with the original strain of SARS-CoV-2. To ensure model stability, we conducted internal validation through 200 iterations of bootstrap resampling. Third, there are some missing data in our study. Because some tests were not available, not all CRP data could be collected. Additionally, one patient's NLR data were missing because the examination was conducted in another hospital and was not recorded in the HIS. However, we mitigated this issue by using multiple imputation for CRP and by replacing the missing NLR value with the median value. Moreover, CRP was not included in the nomogram analysis. Finally, all CT images analyzed in this study were obtained from the same CT scanner (Siemens SOMATOM Emotion, Germany). To validate and confirm the findings of this research, further investigations using data from multiple sites and diverse scanners are warranted.

Conclusion
In summary, we have developed a quantitative prediction model that can assess the risk of progression to severe illness in COVID-19 patients with increased pneumonia lesions in the early days. This model has the potential to guide the systematic treatment of admitted COVID-19 pneumonia patients in the early stages, help reduce the strain on healthcare resources, and improve patient outcomes. COVID-19 pneumonia has many similarities with other types of viral infection and with organizing pneumonia in imaging and pathology (27). This result may serve as a reference for other viral infections in the future.

FIGURE 1 Flow diagram of the study population.

FIGURE 2 COVID-19 pneumonia lesions detected by the AI system and visualized as pseudo colors. First row: initial CT images; second row: AI autosegmentation displayed with blue, pink, and red pseudo colors representing ground-glass opacity (GGO), semi-consolidation, and consolidation, respectively. (A,B) First CT imaging (A) and follow-up after 3 days (B) of a 56-year-old male patient, who did not meet the endpoint during the follow-up. (C,D) First CT imaging (C) and follow-up after 4 days (D) of an 80-year-old female patient, who reached the endpoint of progression to severe illness and died after 13 days.

FIGURE 3 Development and performance of the nomogram. A nomogram (A) for predicting the development of severe illness in COVID-19 patients. ROC curve (B) of the nomogram. Calibration curves of the nomogram (C), which depict the calibration of the nomogram in terms of the agreement between the predicted risk of severe illness and observed outcomes. The 45° dotted blue line represents a perfect prediction, the dotted red lines represent the predictive performance of the nomogram, and the blue lines represent the bootstrap validation. The closer the dotted red and blue lines are to the ideal line, the better the predictive accuracy of the nomogram. Decision curve analysis (DCA) for the nomogram (D). The y-axis represents the net benefit and the x-axis represents the threshold probability. The blue line represents the AI mixed nomogram. The gray line represents the hypothesis that all patients developed severe illness. The black line represents the hypothesis that no patients progressed to severe illness.

TABLE 1
Clinical and CT quantitative characteristics of COVID-19 patients with enlarged lung lesions. a Five missing values replaced using multiple imputation; b One missing value replaced by the median value; c Three missing values replaced using multiple imputation. CRP, C-reactive protein; NLR, neutrophil-to-lymphocyte ratio; GV, ground-glass opacity volume; SV, semi-consolidation volume; CV, consolidation volume.

TABLE 2
Logistic regression analysis of COVID-19 patients with enlarged lung lesions.